
4,392 research outputs found

Application Performance of Physical System Simulations

Author: Conte Thomas M.
Getov Vladimir
Kogge Peter M.
Publication venue: 'IOS Press'
Publication date: 01/01/2020

Various parallel computer benchmarking projects have been around since the early 1990s, but the approaches adopted so far for performance analysis require significant revision in view of recent developments in both the application domain and computer technologies. This paper presents a novel performance evaluation methodology based on assessing the processing rate of two orthogonal use cases (dense and sparse physical systems) as well as the energy efficiency of both. Evaluation results with two popular codes, HPL and HPCG, validate our approach and demonstrate its use for analysis and interpretation in order to identify and confirm current technological challenges, as well as to track and roadmap the future application performance of physical system simulations.

WestminsterResearch

Scavenging, cycling and removal fluxes of 210Po and 210Pb at the Bermuda time-series study site

Author: Baskaran Mark
Church Thomas M.
Conte Maureen H.
Hong G. H.
Publication venue: 'Elsevier BV'
Publication date: 14/11/2012

Author Posting. © The Author(s), 2012. This is the author's version of the work. It is posted here by permission of Elsevier B.V. for personal use, not for redistribution. The definitive version was published in Deep Sea Research Part II: Topical Studies in Oceanography 93 (2013):108–118, doi:10.1016/j.dsr2.2013.01.005.
Quantifying relative affinities of Po and Pb in different populations of marine particulate matter is of great importance in utilizing 210Po as a tracer for carbon cycling. We collected and analyzed water samples for the concentrations of dissolved and total 210Po and 210Pb from the upper 600 m of the water column at the Bermuda Time-series Study site (September 1999 to September 2000) to investigate the seasonality of their concentrations and of their activity ratio (210Po/210Pb activity ratio, AR). Sinking particles collected in sediment traps at depths of 500 m, 1500 m, and 3200 m from the Oceanic Flux Program (OFP) time-series sediment traps were analyzed over a period of 12 months (May 1999 to May 2000). The objective was to compare the deficiencies of 210Po with respect to 210Pb in the water column to those measured in the sediment traps and to assess the relative affinities of Po and Pb with different particle pools. Inventories of 210Po in the upper 500 m of the water column varied by a factor of 2, indicating that seasonal variations in particulate flux dominated the removal of 210Po. The 210Po/210Pb ARs in the dissolved phase were generally less than the secular equilibrium value (1.0) in the upper 600 m, while they were generally greater than 1.0 in the particulate phase, indicating higher removal rates of 210Po relative to 210Pb by particulate matter. The measured fluxes of 210Po and 210Pb in the 500 m, 1500 m, and 3200 m traps increased with depth, while the 210Po/210Pb ARs decreased with depth except from May to August 1999. From the measured fluxes of 210Po and 210Pb at these three traps and the concentrations of 210Po and 210Pb in the water column, this region appears to be a sink for 210Pb, which is likely brought in by lateral advection.
GHH's sabbatical leave was supported by Korea Ocean Research and Development Institute (renamed as Korea Institute of Ocean Science and Technology) (PG47900 and PE98742). The Oceanic Flux Program has been supported since inception by the NSF Chemical Oceanography Program, most recently by grants OCE-0325627/0509602, OCE-0623505 and OCE-0927098. Writing of this manuscript was partially supported by OCE-0961351 (MB).

Woods Hole Open Access Server

The Susceptibility of Programs to Context Switching

Author: Conte Thomas M.
Hwu Wen-mei W.
Publication venue: Center for Reliable and High-Performance Computing, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Publication date: 01/04/1991

Coordinated Science Laboratory was formerly known as Control Systems Laboratory. National Science Foundation / MIP-8809478; NCR Corp.; AMD Corp. 29K Advanced Processor Development Division; National Aeronautics and Space Administration / NASA NAG 1-613; Office of Naval Research / N00014-88-K-0656; Hewlett-Packard Co

Illinois Digital Environment for Access to Learning and Scholarship Repository

Single-Pass Memory System Evaluation for Multiprogramming Workloads

Author: Conte Thomas M.
Hwu Wen-mei W.
Publication venue: Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Publication date: 01/05/1990

Coordinated Science Laboratory was formerly known as Control Systems Laboratory. National Science Foundation (NSF) / MIP-8809478; NCR; National Aeronautics and Space Administration (NASA) / NASA NAG 1-613; Office of Naval Research / N00014-88-K-065

Illinois Digital Environment for Access to Learning and Scholarship Repository

NASA Technical Reports Server

Combining Sampling with Single-Pass Techniques for Efficient Cache Simulation

Author: Conte Thomas M.
Hwu Wen-mei W.
Publication venue: Center for Reliable and High-Performance Computing, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Publication date: 01/12/1991

Coordinated Science Laboratory was formerly known as Control Systems Laboratory. NCR; National Science Foundation / MIP-8809478; Hewlett-Packar

Illinois Digital Environment for Access to Learning and Scholarship Repository

Energy-Aware Opcode Design

Author: Balaji V Iyer
Jason A Poovey
Thomas M Conte
Publication date: 24/04/2020

Abstract: Embedded processors are required to achieve high performance while running on batteries. Thus, they must exploit all possible means to reduce energy consumption without sacrificing performance. This work explores one technique for reducing energy: intelligently designing the instruction opcodes of a processor based on a target workload. The optimization uses a heuristic that not only minimizes switching between adjacent instructions but also simplifies decoding, reducing the number of latches to save dynamic energy. On average, an optimized opcode can be decoded using 40-60% fewer latches in the decoder. In addition, it is shown that a decoder optimized for algorithms with similar program structure, similar data types, or similar behavior exhibits consistent patterns of energy reduction. The techniques presented in this paper yield an average 10% reduction in total dynamic energy. It is also shown that this heuristic can be used to achieve similar results on processors of different issue widths.
I. MOTIVATION
Embedded devices are required to perform several complex tasks that were once attempted by high-performance systems. One solution is to take a general-purpose processor and customize it for an embedded system [1]. These embedded processors are simpler than their high-performance counterparts and require significant assistance from the compiler for scheduling, branch handling, etc. However, unlike for high-performance systems, the availability of compilers, assemblers, and other utilities is limited. The first logical step in designing (or choosing) such processors is to define the target application. This one target application represents the main workload of the processor. It is generally a good assumption that this target application is one of the most frequently executed applications in the system. If this one target application can be run at high performance while consuming less energy, then the overall system energy is reduced. The main focus of this work is to provide a heuristic for the intelligent design of the instruction opcodes of an embedded processor using one application as the target (or training application). The new opcode configuration is created by analyzing the code generator and reducing switching among the adjacent instructions occurring in the target application. The opcodes are designed such that frequently occurring instructions are decoded easily, which reduces the internal decoder power. Unlike previous work, which requires the superset of all benchmarks to be run on the processor to gain any power/energy reduction ([9], [29]), we prove that one benchmark is enough to provide a significant amount of energy reduction. In addition, we show that an energy-efficient opcode design can reduce energy in the decoder and other stages of the pipelined processor. Finally, we show the effects of processor issue-width scaling on the overall power reduction using this methodology. For this work, the compiler is selected and designed before the processor. Using this design approach, the constraints imposed by the compiler (as shown in section 2) are known ahead of time, and the processor can be designed accordingly. The paper is organized as follows. Related work is explained in section 2. Section 3 gives a brief introduction to a retargetable code generator. The experimental framework and the benchmark set are explained in section 4. Section 5 explains the project methodology. The results are discussed in section 6, and the paper is concluded in section 7.
II. RELATED WORK
Several works have been proposed for power and energy reduction using intelligent opcode design. To our knowledge, the only work that closely resembles ours is by Benini et al. Cheng and Tyso

CiteSeerX